Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data
نویسندگان
چکیده
Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. In contrast to existing storage and compute clouds, Sector can manage data not only within a data center, but also across geographically distributed data centers. Similarly, the Sphere compute cloud supports User Defined Functions (UDF) over data both within a data center and across data centers. As a special case, MapReduce style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort Benchmark. In these studies, Sector is about twice as fast as Hadoop. Sector/Sphere is open source.
منابع مشابه
Sector and Sphere: the design and implementation of a high-performance data cloud
Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but...
متن کاملAn Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity
The data grid technology, which uses the scale of the Internet to solve storage limitation for the huge amount of data, has become one of the hot research topics. Recently, data replication strategies have been widely employed in distributed environment to copy frequently accessed data in suitable sites. The primary purposes are shortening distance of file transmission and achieving files from ...
متن کاملSecurity-Constrained Unit Commitment Considering Large-Scale Compressed Air Energy Storage (CAES) Integrated With Wind Power Generation
Environmental concerns and depletion of nonrenewable resources has made great interest towards renewable energy resources. Cleanness and high potential are factors that caused fast growth of wind energy. However, the stochastic nature of wind energy makes the presence of energy storage systems (ESS) in wind integrated power systems, inevitable. Due to capability of being used in large-scale sys...
متن کاملLogistics performance of European Union markets: Towards the development of entrepreneurship in the transport and storage sector
The markets globalization is one of the factors creating conditions for the development of entrepreneurship. Entrepreneurship does not have one generally accepted definition. Most often, entrepreneurship is perceived as the ability to increase the number of enterprises. Entrepreneurship can be understood as the potential to identify and use development opportunities regardless of own resources....
متن کاملE2DR: Energy Efficient Data Replication in Data Grid
Abstract— Data grids are an important branch of gird computing which provide mechanisms for the management of large volumes of distributed data. Energy efficiency has recently emerged as a hot topic in large distributed systems. The development of computing systems is traditionally focused on performance improvements driven by the demand of client's applications in scientific and business domai...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/0809.1181 شماره
صفحات -
تاریخ انتشار 2008